DPTree: A Distributed Pattern Tree Index for Partial-Match Queries in Peer-to-Peer Networks

Authors

  • Dyce Jing Zhao
  • Dik Lun Lee
  • Qiong Luo

Abstract

Partial-match queries return data items that contain a subset of the query keywords and rank the results by the statistical properties of the matched keywords. They are essential for information retrieval on large document repositories. However, most current peer-to-peer networks for information retrieval are based on distributed hashing and therefore cannot support partial-match queries efficiently. In this paper, we describe an efficient and scalable technique to support partial-match queries on peer-to-peer networks. We observe that the keyword combinations that appear in queries are only a small subset of all possible combinations of the keywords in the documents. We therefore propose a distributed index structure, called a distributed pattern tree (DPTree), that records frequent query patterns, i.e., combinations of keywords, learnt from the query history at each node in the network. Using this index, a query can quickly identify its best-matching patterns, and data lookup takes time logarithmic in the network size. Our simulation studies on the TREC data sets show promising results compared with previous approaches.
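The two ideas in the abstract, ranking partial matches by keyword statistics and looking up the best frequent query pattern contained in a query, can be illustrated with a small sketch. It is a hypothetical, single-machine illustration, not the authors' DPTree: it omits the distribution of the index across peers, the tree structure, and the logarithmic lookup entirely. The names `partial_match` and `QueryPatternIndex`, the `min_support` threshold, and the use of summed IDF as the "statistical property" are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def partial_match(query, docs):
    """Return ids of documents containing at least one query keyword,
    ranked by the summed IDF of the matched keywords (assumed scoring)."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    scored = []
    for doc_id, terms in docs.items():
        matched = set(query) & set(terms)
        if matched:
            score = sum(math.log(n / df[t]) for t in matched)
            scored.append((score, doc_id))
    # Highest score first; doc_id breaks ties deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


class QueryPatternIndex:
    """Hypothetical, centralized stand-in for a query-pattern index:
    counts keyword combinations seen in past queries and treats those
    seen at least `min_support` times as frequent patterns."""

    def __init__(self, min_support=2):
        self.min_support = min_support
        self.counts = Counter()

    def record(self, query):
        terms = sorted(set(query))
        # Count every non-empty sub-combination of the query's keywords.
        # This is exponential in query length, so it assumes short queries.
        for size in range(1, len(terms) + 1):
            for combo in combinations(terms, size):
                self.counts[frozenset(combo)] += 1

    def frequent_patterns(self):
        return {p for p, c in self.counts.items() if c >= self.min_support}

    def best_match(self, query):
        """Largest frequent pattern contained in the query; ties go to the
        more frequent pattern, then to the alphabetically first one."""
        q = set(query)
        candidates = [p for p in self.frequent_patterns() if p <= q]
        if not candidates:
            return None
        return min(candidates,
                   key=lambda p: (-len(p), -self.counts[p], tuple(sorted(p))))


if __name__ == "__main__":
    idx = QueryPatternIndex(min_support=2)
    for q in [["peer", "index"], ["peer", "index", "tree"], ["peer", "query"]]:
        idx.record(q)
    print(idx.best_match(["peer", "index", "hashing"]))

    docs = {"d1": ["peer", "index", "tree"], "d2": ["peer", "hash"],
            "d3": ["index"], "d4": ["music"]}
    print(partial_match(["peer", "tree"], docs))
```

With the three recorded queries above, only `{peer}`, `{index}` and `{peer, index}` reach the support threshold, so the query `peer index hashing` maps to the pattern `{peer, index}`, while a query made only of rarely seen keywords finds no frequent pattern. Enumerating every sub-combination is workable here only because queries tend to be short; a real system would need a dedicated frequent-pattern miner and, per the abstract, a distributed index.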


Similar resources

Associative Search in Peer to Peer Networks: Harnessing Latent Semantics

The success of a P2P file-sharing network highly depends on the scalability and versatility of its search mechanism. Two particularly desirable search features are scope (ability to find infrequent items) and support for partial-match queries (queries that contain typos or include a subset of keywords). While centralized-index architectures (such as Napster) can support both these features, exi...

Indexing Distributed Complex Data for Complex Queries

Peer-to-peer networks are becoming a common form of online data exchange. Querying data, mostly files, using keywords on peer-to-peer networks is well-known. But users cannot perform many types of queries on complex data and on many of the attributes of the data on such networks other than mostly exact-match queries. We introduce a distributed hashing-based index for enabling more powerful acce...

A Tabu-Based Cache to Improve Range Queries on Prefix Trees

Distributed Hash Tables (DHTs) provide the substrate to build large scale distributed applications over Peer-to-Peer networks. A major limitation of DHTs is that they only support exact-match queries. In order to offer range queries over a DHT it is necessary to build additional indexing structures. Prefix-based indexes, such as Prefix Hash Tree (PHT), are interesting approaches for building dis...

Range queries over skip tree graphs

The support for complex queries, such as range, prefix and aggregation queries, over structured peer-to-peer systems is currently an active and significant topic of research. This paper demonstrates how Skip Tree Graph, as a novel structure, presents an efficient solution to that problem area through provision of a distributed search tree functionality on decentralised and dynamic environments....

Publication date: 2006